Efficient Estimation of a Model with
an Autoregressive Signal with White Noise

by 

Yuzo  Hosoya 

Tohoku  University 
Japan 

Abstract 

This paper considers the estimation of parameters in the model
$X_t = s_t + \epsilon_t$, where the $s_t$ are generated by a stationary autoregressive
model $\sum_{i=0}^{p} \alpha_i s_{t-i} = \eta_t$ and the $\eta_t$ and the $\epsilon_t$ are i.i.d. random
variables. In case the $\epsilon_t$ and the $\eta_t$ are Gaussian, Hosoya (Yale Ph.D.
thesis, 1974), Pagano (Ann. Stat., 1974) and Dunsmuir (Ann. Stat., 1979),
respectively, constructed efficient estimates and gave their asymptotic
distribution. This paper gives the asymptotic distribution of an approximate
maximum-likelihood estimate using only a condition on the fourth-order
moments of $\epsilon_t$ and $\eta_t$ and without the assumption of normality. This
paper also contains a theorem which shows that under general conditions an
estimate given by the second step in the Newton-Raphson iteration with a
consistent estimate as an initial value is second-order efficient in the sense
of C. R. Rao's definition (Rao, J.R.S.S.B., 1962).


Key  words:  autoregressive  signal  plus  white  noise,  approximate  maximum- 
likelihood  estimate,  Whittle-Walker  model,  asymptotic  distribution, 
Newton-Raphson  iteration. 
0.  Introduction.

Suppose that a message $X_t$ has been received, but that, due to noise
in the channel of communication, the original signal $s_t$ cannot be reconstructed
directly from the observation $X_t$ $(= s_t + \epsilon_t)$. The techniques of
so-called signal detection (or signal extraction) have been developed, as an
important field of communication theory, for the purpose of inferring the
signal sent. [See Whalen (1971).] This same problem has also been called,
by some econometricians, the problem of "unobservable variables". Namely,
they maintain that the actually observed quantities do not necessarily coincide
with the corresponding variables in a theoretical framework; thus certain
noise-elimination techniques need to be applied to observations when a
theoretical model is fitted. [See, for example, Grether and Nerlove (1970).]

A part of this work was done while I was visiting the Department of
Statistics at Stanford University during the fall quarter of 1978. I would
like to express my sincerest thanks to Professor T. W. Anderson for his
reading and improving the paper, and also to Messrs. S. Sugihara and F. Ahrabi
for their pertinent comments.

In a probabilistic framework, this signal-extraction problem has a
direct connection with prediction theory in stationary stochastic processes,
and a technique similar to the construction of the optimal linear filter
in prediction can be applied. In particular, if the spectral densities of
the $s_t$ and the $\epsilon_t$ are rational, the optimal (in the sense of minimum
mean-square error) estimate of $s_t$ can be obtained from a fairly simple
recursive formula. [See Whittle (1963).] Prediction theory, however,
assumes complete knowledge of the spectral structure of both signal and
noise, and in practical situations this is not usually the case.
Rather, in most cases, what is required is to recover the information
concerning the structure of the signal and noise. For this purpose,
there seem to be two statistical treatments. One is to assume $s_t$ to
be a certain (not necessarily linear) deterministic function of a time
parameter or of other parameters and to apply least-squares or other
pertinent methods (Walker (1969) and Hannan (1971)). Another approach
considers the model of a nondeterministic stationary signal. This paper
explores the latter approach. The model which will be investigated
below is the following: Assume that a signal $s_t$ is observed superimposed
by white noise $\epsilon_t$, that is,

(1)  $$X_t = s_t + \epsilon_t ,$$

and assume further that the signal is generated by an autoregressive
process

(2)  $$\sum_{i=0}^{p} \alpha_i s_{t-i} = \eta_t , \qquad t = \ldots, -1, 0, 1, \ldots ,$$

with $\alpha_0 = 1$, where the $\epsilon_t$ and the $\eta_t$ are respectively i.i.d., and
mutually independent. Henceforth write $\alpha$ for the vector whose $i$-th element is $\alpha_i$.
The interest in investigating specifically this type of model is
that, as will be seen, this is a special case of rational spectra. [The
spectrum of the present model is formally a rational function of $e^{i\omega}$,
but the parameters of the denominator and of the numerator are functionally
related to each other. The usual statistical estimation procedures of
rational spectra do not seem to apply to the present model effectively.
Though the author attempted the application of Hannan's method of estimation
of rational spectra (see Hannan (1970)) to this model, it was
unsatisfactory. It seems that only when the variance ratio of $\epsilon_t$ and
$\eta_t$ is known can Hannan's method (after certain modifications) be applied.]

In his paper Parzen (1967) suggests using the Yule-Walker equations
or the instrumental variable method to estimate the parameters in the model
expressed by (1) and (2). The method is consistent, but not efficient.
The Yule-Walker equations can be derived as follows. Noting that, in
view of (1) and (2),

$$\sum_{i=0}^{p} \alpha_i E(X_{t-i} X_{t-p-\ell}) = 0 \qquad \text{for } \ell = 1, 2, \ldots, p ,$$

an estimate of the $\alpha_i$ can be obtained by solving those equations after
replacing $E(X_{t-i} X_{t-p-\ell})$ by the sample covariance
$\sum_{t=p+\ell+1}^{N} X_{t-i} X_{t-p-\ell} / (N-p-\ell)$.
Walker (1960) observed that in the case $p = 1$ the efficiency of this estimate
is near unity only for small $\alpha$ or for a high signal-to-noise ratio.
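The Yule-Walker step described above can be illustrated numerically. The following is only a sketch under stated assumptions: the function names `modified_yule_walker` and `simulate` are hypothetical, the data are simulated with unit variances for both $\epsilon_t$ and $\eta_t$, and the sign convention is that of (2), so that a signal $s_t = 0.6 s_{t-1} + \eta_t$ corresponds to $\alpha_1 = -0.6$.

```python
import numpy as np

def modified_yule_walker(x, p):
    """Solve sum_{i=0}^p alpha_i * c(l, i) = 0 for l = 1..p, with alpha_0 = 1,
    where c(l, i) is the sample covariance of X_{t-i} and X_{t-p-l}."""
    x = np.asarray(x, dtype=float)
    n = x.size
    c = np.empty((p, p + 1))
    for l in range(1, p + 1):
        # 1-based t = p+l+1, ..., N corresponds to 0-based t0 = p+l, ..., n-1
        t0 = np.arange(p + l, n)
        for i in range(p + 1):
            c[l - 1, i] = np.sum(x[t0 - i] * x[t0 - p - l]) / (n - p - l)
    # Move the alpha_0 = 1 column to the right-hand side, solve for alpha_1..alpha_p
    return np.linalg.solve(c[:, 1:], -c[:, 0])

def simulate(n, alpha1, rng, burn_in=500):
    """Hypothetical helper: s_t + alpha1*s_{t-1} = eta_t, X_t = s_t + eps_t."""
    eta = rng.standard_normal(n + burn_in)
    s = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        s[t] = -alpha1 * s[t - 1] + eta[t]
    return s[burn_in:] + rng.standard_normal(n)

rng = np.random.default_rng(0)
x = simulate(50000, -0.6, rng)
alpha_tilde = modified_yule_walker(x, p=1)
print(alpha_tilde)  # expected to lie near -0.6 for a long series
```

For $p = 1$ the system reduces to $\tilde\alpha_1 = -C_2/C_1$ in terms of the lag-1 and lag-2 sample covariances, which makes clear why the estimate becomes unstable when the lag-1 covariance is small.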

This paper considers the model given by (1) and (2), and establishes
the asymptotic properties of an approximate maximum-likelihood estimate;
an efficient estimate is also constructed. An estimate is called
efficient below when its asymptotic distribution is normal with asymptotic
covariance matrix equal to the limit of the inverse of the average Fisher
information matrix when the process is Gaussian. Another approach for
constructing an efficient estimate of the $\alpha$'s was proposed by Pagano (1974),
though his estimate is different from the one given here in that his method
does not use the likelihood function and also his consistent estimate for
the starting value of iteration is different from the one proposed below.
The author of the present paper has shown an optimality of the use of the
likelihood function to obtain an efficient estimate of parameters in time-series
models in his paper (1977).

The program proceeds as follows: In the next section, an approximate
likelihood function is given, and the asymptotic properties of the approximate
maximum-likelihood estimate are derived as a corollary of a more general
result concerning the approximate maximum-likelihood estimation of a linear
process plus white noise; the result is given in Appendix 2 since it has
an independent interest. [For simplicity of terminology, the value of
the parameter maximizing an approximate likelihood function will be called
below the maximum-likelihood estimate. This will cause no confusion.]
Section 2 concerns the construction of an efficient estimate of $\alpha$, $\sigma_\epsilon^2$
and $\sigma_\eta^2$, where $\sigma_\epsilon^2$ and $\sigma_\eta^2$ are respectively the variance of $\epsilon_t$ and $\eta_t$.
A method that yields consistent estimates of $\sigma_\epsilon^2$ and of $\sigma_\eta^2$ is shown and an
efficient estimate of the parameters will be given by the Newton-Raphson iteration.
Appendix 1 to this paper establishes that under general conditions the
second Newton-Raphson iteration gives an estimate which is equivalent to
the maximum-likelihood estimate to the probability order $1/N$, where $N$
is the sample size, whereas efficient estimates in general are equivalent
to the maximum-likelihood estimate to the probability order $1/\sqrt{N}$.

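Finally, since this paper exclusively considers the case where a
signal is generated by an autoregressive scheme, it may be pertinent to
offer a comment on the moving-average type signal. Consider, for
example, the simplest moving-average scheme $s_t = \eta_t + \alpha\eta_{t-1}$ and suppose
the $X_t$ $(= s_t + \epsilon_t)$ are observed. Assuming the same conditions on $\epsilon_t$
and $\eta_t$ as previously given, the spectral density of $X_t$ is given by

$$f(\omega|\alpha, \sigma_\epsilon^2, \sigma_\eta^2) = \frac{1}{2\pi}\left\{ |1 + \alpha e^{i\omega}|^2 \sigma_\eta^2 + \sigma_\epsilon^2 \right\} , \qquad -\pi < \omega < \pi .$$

If $\sigma_\epsilon^2$ and $\sigma_\eta^2$ are unknown, the values of $\alpha$, $\sigma_\epsilon^2$ and $\sigma_\eta^2$ that give the same spectral
density $f$ as a function of $\omega$ are not unique. Suppose

$$|1 + \alpha e^{i\omega}|^2 \sigma_\eta^2 + \sigma_\epsilon^2 = |1 + \alpha^* e^{i\omega}|^2 \sigma_\eta^{*2} + \sigma_\epsilon^{*2} ,$$

from which it follows that

$$(1 + \alpha^2)\,\sigma_\eta^2 + \sigma_\epsilon^2 = (1 + \alpha^{*2})\,\sigma_\eta^{*2} + \sigma_\epsilon^{*2} ,$$

$$\alpha\,\sigma_\eta^2 = \alpha^*\,\sigma_\eta^{*2} .$$

Given $\alpha^*$, $\sigma_\epsilon^{*2}$ and $\sigma_\eta^{*2}$, it is easy to see that the solution of the
equations above is indeterminate, even if there is the restriction
$|\alpha| < 1$.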
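The indeterminacy can be checked with a small numerical example. The sketch below assumes illustrative parameter values that do not appear in the paper: starting from $(\alpha, \sigma_\eta^2, \sigma_\epsilon^2) = (0.5, 1, 1)$, a second triple with $\alpha^* = 0.4$ is obtained from the two moment equations and yields the same spectral density.

```python
import numpy as np

def ma1_plus_noise_spectrum(omega, alpha, var_eta, var_eps):
    """Spectral density of X_t = eta_t + alpha*eta_{t-1} + eps_t."""
    return (np.abs(1.0 + alpha * np.exp(1j * omega)) ** 2 * var_eta + var_eps) / (2.0 * np.pi)

omega = np.linspace(-np.pi, np.pi, 1001)

# First parameter triple (illustrative values)
alpha, var_eta, var_eps = 0.5, 1.0, 1.0

# Second triple: pick alpha* freely, then solve the two moment equations
alpha_star = 0.4
var_eta_star = alpha * var_eta / alpha_star
var_eps_star = (1 + alpha**2) * var_eta + var_eps - (1 + alpha_star**2) * var_eta_star

f1 = ma1_plus_noise_spectrum(omega, alpha, var_eta, var_eps)
f2 = ma1_plus_noise_spectrum(omega, alpha_star, var_eta_star, var_eps_star)
same_spectrum = np.allclose(f1, f2)
print(var_eta_star, var_eps_star, same_spectrum)
```

Both triples satisfy $|\alpha| < 1$ and have positive variances, so the restriction on $\alpha$ alone does not identify the parameters.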


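1.  The Likelihood Function.

An approximate likelihood function for $\alpha$, $\sigma_\epsilon^2$ and $\sigma_\eta^2$ is derived
here under the assumption that $\epsilon_t$ and $\eta_t$ are Gaussian. First of all
the spectral density of $X_t$ generated by (1) and (2) can be written as

(3)  $$f(\omega|\alpha, \sigma_\epsilon^2, \sigma_\eta^2) = \frac{1}{2\pi}\left\{ \frac{\sigma_\eta^2}{\left|\sum_{j=0}^{p} \alpha_j e^{ij\omega}\right|^2} + \sigma_\epsilon^2 \right\} , \qquad -\pi < \omega < \pi ,$$

where $\alpha_0 = 1$. Write (3) as

$$f(\omega|\alpha, \sigma_\epsilon^2, \sigma_\eta^2) = \frac{1}{2\pi}\, \frac{\sigma_\eta^2 + \sigma_\epsilon^2 \left|\sum_{j=0}^{p} \alpha_j e^{ij\omega}\right|^2}{\left|\sum_{j=0}^{p} \alpha_j e^{ij\omega}\right|^2} .$$

Then the numerator can be factorized into $\sigma^2 \left|\sum_{j=0}^{p} \beta_j e^{ij\omega}\right|^2$ for some $\sigma^2$
and $\beta$ $(\beta_0 = 1)$. This immediately follows from the Fejér-Riesz theorem
[for example, Akhiezer (1956), p. 152] which says that if

(4)  $$g(\omega) = \sum_{k=-p}^{p} c_k e^{ik\omega} , \qquad -\pi < \omega < \pi ,$$

and $g(\omega)$ is real and nonnegative, then there exists an $h(\omega)$ such that
$g(\omega) = |h(\omega)|^2$ and $h(\omega) = \sum_{k=0}^{p} d_k e^{ik\omega}$. The numerator of (3) is, as is
obvious from its expansion, of the form (4) and nonnegative; therefore

(5)  $$f(\omega|\alpha, \sigma_\epsilon^2, \sigma_\eta^2) = \frac{\sigma^2}{2\pi}\, \frac{\left|\sum_{k=0}^{p} \beta_k e^{ik\omega}\right|^2}{\left|\sum_{j=0}^{p} \alpha_j e^{ij\omega}\right|^2} ,$$

where $\sigma^2$ and $\beta$ are functions of $\alpha$, $\sigma_\epsilon^2$ and $\sigma_\eta^2$.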
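For $p = 1$ the factorization leading to (5) can be written in closed form, which may help fix ideas. The sketch below is an illustration under assumptions, not part of the original derivation: the function name `factorize_p1` is hypothetical, and the parameter values are chosen only for the example. Matching coefficients gives $\sigma^2(1+\beta^2) = \sigma_\eta^2 + \sigma_\epsilon^2(1+\alpha^2)$ and $\sigma^2\beta = \sigma_\epsilon^2\alpha$, and the root with $|\beta| < 1$ is selected. As a check, the code also evaluates $2\pi\exp\{\frac{1}{2\pi}\int\log f\,d\omega\}$, which by Kolmogorov's formula should equal $\sigma^2$.

```python
import numpy as np

def factorize_p1(alpha, var_eps, var_eta):
    """Return (sigma2, beta) with |beta| < 1 such that
    var_eta + var_eps*|1 + alpha e^{iw}|^2 = sigma2*|1 + beta e^{iw}|^2."""
    c0 = var_eta + var_eps * (1.0 + alpha**2)   # constant term
    c1 = var_eps * alpha                        # coefficient of 2*cos(w)
    if c1 == 0.0:
        return c0, 0.0
    r = c0 / c1                                 # beta + 1/beta = r
    beta = 0.5 * (r - np.sign(r) * np.sqrt(r * r - 4.0))
    return c1 / beta, beta

alpha, var_eps, var_eta = -0.6, 1.0, 1.0        # illustrative values
sigma2, beta = factorize_p1(alpha, var_eps, var_eta)

# Kolmogorov's formula evaluated on a uniform grid over one period
omega = 2.0 * np.pi * np.arange(4096) / 4096
ar_poly = np.abs(1.0 + alpha * np.exp(1j * omega)) ** 2
f = (var_eta + var_eps * ar_poly) / (2.0 * np.pi * ar_poly)
var_e = 2.0 * np.pi * np.exp(np.mean(np.log(f)))
print(sigma2, beta, var_e)
```

The agreement between `var_e` and `sigma2` reflects the fact that both polynomials in (5) have their zeroes outside the unit circle, so the logarithms of their squared moduli integrate to zero.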

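Now in view of (5), $X_t$ may be interpreted as generated by a linear
process $X_t = \sum_{k=0}^{\infty} \gamma_k e_{t-k}$, where the $\gamma_k$'s are obtained from the equations

$$\sum_{k=0}^{\infty} \gamma_k e^{i\omega k} = \sum_{\ell=0}^{p} \beta_\ell e^{i\ell\omega} \Big/ \sum_{j=0}^{p} \alpha_j e^{ij\omega} ,$$

and where the $e_t$ are independent random variables such that

(6)  $$\mathrm{Var}(e_t) = 2\pi \exp\left\{ \frac{1}{2\pi}\int_{-\pi}^{\pi} \log f(\omega)\,d\omega \right\}
= 2\pi \exp\left\{ \frac{1}{2\pi}\int_{-\pi}^{\pi} \left[ \log\frac{\sigma^2}{2\pi} + \log \frac{\left|\sum_k \beta_k e^{ik\omega}\right|^2}{\left|\sum_j \alpha_j e^{ij\omega}\right|^2} \right] d\omega \right\}$$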

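is analytic and $|\sum_j \alpha_j Z^j| \neq 0$. Thus $\sum_k \beta_k Z^k / \sum_j \alpha_j Z^j = \sum_k \gamma_k Z^k$ is analytic
and nonzero on $\{Z : |Z| < 1 + \delta\}$ for some $\delta$.

B) For large $N$, $Q_N$ can be approximated by

(9)  $$U_N(X, \alpha, \sigma_\epsilon^2, \sigma_\eta^2) = \frac{1}{(2\pi)^2}\int_{-\pi}^{\pi} \left|\sum_{n=1}^{N} X_n e^{in\omega}\right|^2 \Big/ f(\omega|\alpha, \sigma_\epsilon^2, \sigma_\eta^2)\; d\omega
= \frac{N}{2\pi}\sum_{k=-(N-1)}^{N-1} \delta_k C_k ,$$

where $\delta_k$ is the $k$-th Fourier coefficient of $1/f(\omega|\alpha, \sigma_\epsilon^2, \sigma_\eta^2)$ and $C_k$
is the sampling autocovariance of $k$-th order.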

Now by A) and B), the log-likelihood function (8) can be approximated
for large $N$ by

(10)  $$\log L_N(\theta) = -\frac{N}{2}\log 2\pi\sigma^2(\theta) - \frac{1}{2} U_N(X, \theta)$$

$$= -\frac{N}{2}\log (2\pi)^2 - \frac{N}{2}\,\frac{1}{2\pi}\int_{-\pi}^{\pi} \log f(\omega|\theta)\,d\omega
- \frac{1}{2(2\pi)^2}\int_{-\pi}^{\pi} \frac{\left|\sum_{n=1}^{N} X_n e^{in\omega}\right|^2}{f(\omega|\theta)}\,d\omega ,$$

where $\theta_i = \alpha_i$ for $i = 1, 2, \ldots, p$ and $\theta_{p+1} = \sigma_\epsilon^2$, $\theta_{p+2} = \sigma_\eta^2$, and

$$f(\omega|\theta) = \frac{1}{2\pi}\left\{ \theta_{p+2} \Big/ \left|\sum_{j=0}^{p} \theta_j e^{ij\omega}\right|^2 + \theta_{p+1} \right\} \qquad (\theta_0 = 1) .$$

The general treatment
of the asymptotic properties of the least-squares estimate obtained from
maximizing the approximate likelihood function of the form (10) is furnished
in Appendix 2, and the following Theorem 1 is a straightforward
consequence of that result. That appendix deals with the asymptotic
properties of the least-squares estimate of parameters of a general linear
process on which a white noise is superimposed, and derives them by means
of an extension of the Whittle-Walker theorem. [See Whittle (1952) and
Walker (1964).]

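For the model represented by (1) and (2), assume the following:

A-1) The $\epsilon_t$ and $\eta_t$ are strictly stationary processes with finite
fourth cumulants which are denoted as $\kappa_4(\epsilon)$ and $\kappa_4(\eta)$ respectively.

A-2) Let $\alpha^0$, $\sigma_{\epsilon 0}^2$ and $\sigma_{\eta 0}^2$ be the true values of $\alpha$, $\sigma_\epsilon^2$ and $\sigma_\eta^2$
respectively. Then $\alpha^0 \in A$ with $A$ a compact subset of $R^p$ such that,
for any $\alpha \in A$, all zeroes of $\sum_{j=0}^{p} \alpha_j z^j$ are outside of the unit circle,
and $\sigma_{\epsilon 0}^2$ and $\sigma_{\eta 0}^2$ are respectively in a compact subset of $(0, \infty)$.

Then

Theorem 1:

Let $\hat\alpha$, $\hat\sigma_\epsilon^2$ and $\hat\sigma_\eta^2$ be the least-squares estimates derived from the
function (10). Let $h(\omega|\theta) = 1/f(\omega|\theta)$. Then $\sqrt{N}(\hat\alpha - \alpha^0)$, $\sqrt{N}(\hat\sigma_\epsilon^2 - \sigma_{\epsilon 0}^2)$
and $\sqrt{N}(\hat\sigma_\eta^2 - \sigma_{\eta 0}^2)$ are asymptotically jointly normally distributed with
mean vector 0, and with covariance matrix
$4\pi W_0^{-1} + \kappa_4(\epsilon)\, W_0^{-1} U_0 W_0^{-1} + \kappa_4(\eta)\, W_0^{-1} V_0 W_0^{-1}$,
where $W_0$, $U_0$ and $V_0$ are $(p+2)\times(p+2)$ matrices
with the representative terms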


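which, for practical purposes, can be approximated by

(11)  $$-\sum_{j=0}^{N-1} \log f(\omega_j|\theta) - \sum_{j=0}^{N-1} \frac{I(\omega_j)}{f(\omega_j|\theta)} ,$$

where $\omega_j = 2\pi j/N$, $j = 0, 1, \ldots, N-1$, and $I(\omega_j) = \left|\sum_n X_n e^{in\omega_j}\right|^2 / (2\pi N)$.
Let the quantity (11) be denoted by $A(\theta, X)$. The first derivative of
$A(\theta, X)$ is nonlinear with respect to $\theta$, so that a certain approximation
is required for the solution of $\partial A(\theta, X)/\partial\theta = 0$. It can be shown that
the Newton-Raphson iteration procedure generally produces an estimate as
efficient as the maximum-likelihood estimate if the iteration starts with
a consistent estimate of $\theta$; Theorem 3 of Appendix 1 proves that the
first-step iteration produces an estimate $\theta^2$ such that $\sqrt{N}(\theta^2 - \hat\theta)$
converges to 0 in probability and moreover that $N(\theta^3 - \hat\theta)$ tends to 0
in probability for the estimate $\theta^3$ obtained by the second step of the
iteration. For that theorem to apply, two points must be checked. The
one is whether the starting value of $\theta$ is consistent, and the other is
whether the present model satisfies the conditions of Theorem 3. Concerning
the first point, it has already been shown that the solution of the
Yule-Walker equations is a consistent estimate of $\alpha$. The starting values
for $\sigma_\epsilon^2$ and $\sigma_\eta^2$ can be constructed as follows: Let $g(\omega|\alpha) = \left|\sum_{k=0}^{p} \alpha_k e^{ik\omega}\right|^2$,
and let $\tilde\alpha$ be the solution of the Yule-Walker equations. Taking relation
(3) into consideration, regress $2\pi I(\omega_j)$ on $1/g(\omega_j|\tilde\alpha)$, $j = 0, 1, \ldots, N-1$.
Then estimates of $\sigma_\epsilon^2$ and $\sigma_\eta^2$ can be obtained as the regression coefficients.
Namely, writing $I_j = I(\omega_j)$ and $g_j = g(\omega_j|\tilde\alpha)$, calculate

(12)  $$\tilde\sigma_\epsilon^2 = \frac{ \frac{1}{N}\sum_j 2\pi I_j \cdot \frac{1}{N}\sum_j \frac{1}{g_j^2} - \frac{1}{N}\sum_j \frac{2\pi I_j}{g_j} \cdot \frac{1}{N}\sum_j \frac{1}{g_j} }{ \frac{1}{N}\sum_j \frac{1}{g_j^2} - \left( \frac{1}{N}\sum_j \frac{1}{g_j} \right)^2 } ,$$

(13)  $$\tilde\sigma_\eta^2 = \frac{ \frac{1}{N}\sum_j \frac{2\pi I_j}{g_j} - \frac{1}{N}\sum_j 2\pi I_j \cdot \frac{1}{N}\sum_j \frac{1}{g_j} }{ \frac{1}{N}\sum_j \frac{1}{g_j^2} - \left( \frac{1}{N}\sum_j \frac{1}{g_j} \right)^2 } .$$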
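The regression that yields the starting values can be sketched as follows. This is a minimal illustration under assumptions: the periodogram is normalized as $|\sum_n X_n e^{in\omega_j}|^2/(2\pi N)$ so that $2\pi I(\omega_j)$ has approximate mean $\sigma_\eta^2/g(\omega_j|\alpha) + \sigma_\epsilon^2$; the helper names are hypothetical; and, to keep the example short, the true value of $\alpha$ is passed in where in practice the Yule-Walker solution would be used.

```python
import numpy as np

def periodogram(x):
    """I(w_j) = |sum_n x_n e^{i n w_j}|^2 / (2 pi N) at w_j = 2 pi j / N."""
    return np.abs(np.fft.fft(x)) ** 2 / (2.0 * np.pi * x.size)

def g_alpha(omega, alpha):
    """g(w | alpha) = |sum_{k=0}^p alpha_k e^{i k w}|^2 with alpha_0 = 1."""
    coeffs = np.concatenate(([1.0], np.atleast_1d(alpha)))
    k = np.arange(coeffs.size)
    return np.abs(np.exp(1j * np.outer(omega, k)) @ coeffs) ** 2

def starting_variances(x, alpha):
    """Least-squares regression of 2*pi*I(w_j) on 1/g(w_j | alpha) with intercept.
    The intercept estimates var_eps, the slope estimates var_eta."""
    n = x.size
    omega = 2.0 * np.pi * np.arange(n) / n
    y = 2.0 * np.pi * periodogram(x)
    z = 1.0 / g_alpha(omega, alpha)
    denom = np.mean(z**2) - np.mean(z) ** 2
    var_eps = (np.mean(y) * np.mean(z**2) - np.mean(y * z) * np.mean(z)) / denom
    var_eta = (np.mean(y * z) - np.mean(y) * np.mean(z)) / denom
    return var_eps, var_eta

# Simulated data (illustrative): s_t = 0.6 s_{t-1} + eta_t, i.e. alpha_1 = -0.6
rng = np.random.default_rng(0)
n, burn_in = 20000, 500
eta = rng.standard_normal(n + burn_in)
s = np.zeros(n + burn_in)
for t in range(1, n + burn_in):
    s[t] = 0.6 * s[t - 1] + eta[t]
x = s[burn_in:] + rng.standard_normal(n)

var_eps_tilde, var_eta_tilde = starting_variances(x, alpha=-0.6)
print(var_eps_tilde, var_eta_tilde)  # both expected to be near 1 here
```

Since the regression is linear in $\sigma_\epsilon^2$ and $\sigma_\eta^2$, the estimates are not constrained to be positive; in short series a negative starting value may occur and would need separate handling.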


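Theorem 2:

The $\tilde\sigma_\epsilon^2$ and $\tilde\sigma_\eta^2$ are consistent estimates of $\sigma_\epsilon^2$ and $\sigma_\eta^2$ respectively.

Proof:

Since $\tilde\alpha$ is consistent, writing $f = f(\omega|\alpha, \sigma_\epsilon^2, \sigma_\eta^2)$ and $g = g(\omega|\alpha)$,

$$\underset{N\to\infty}{\mathrm{plim}}\; \tilde\sigma_\epsilon^2 =
\frac{ \frac{1}{2\pi}\int 2\pi f\,d\omega \cdot \frac{1}{2\pi}\int \frac{d\omega}{g^2}
 - \frac{1}{2\pi}\int \frac{2\pi f}{g}\,d\omega \cdot \frac{1}{2\pi}\int \frac{d\omega}{g} }
{ \frac{1}{2\pi}\int \frac{d\omega}{g^2} - \left( \frac{1}{2\pi}\int \frac{d\omega}{g} \right)^2 } ,$$

where all integrals are over $(-\pi, \pi)$. Because $2\pi f = \sigma_\eta^2/g + \sigma_\epsilon^2$ by (3),
the numerator equals

$$\sigma_\epsilon^2 \left\{ \frac{1}{2\pi}\int \frac{d\omega}{g^2} - \left( \frac{1}{2\pi}\int \frac{d\omega}{g} \right)^2 \right\} .$$

Thus $\mathrm{plim}_{N\to\infty}\, \tilde\sigma_\epsilon^2 = \sigma_\epsilon^2$. In the same way, $\mathrm{plim}_{N\to\infty}\, \tilde\sigma_\eta^2 = \sigma_\eta^2$. []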
^ J2 

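In order to see that the Newton-Raphson method with $\tilde\alpha$, $\tilde\sigma_\epsilon^2$ and $\tilde\sigma_\eta^2$
as its starting values provides an efficient estimate, it is sufficient to
check that the approximate likelihood function (11) with conditions A-1 and
A-2 satisfies conditions C-1 and C-2 of Theorem 3. Condition A-2 implies
that $f(\omega|\theta)$, $\partial^2 f(\omega|\theta)/\partial\theta_i\partial\theta_j$ and $\partial^3 f(\omega|\theta)/\partial\theta_i\partial\theta_j\partial\theta_h$, $i, j, h = 1, \ldots,$
$p+2$, are uniformly continuous with respect to $\omega \in (-\pi, \pi)$ and $\theta$ in the
parameter space. Thus convergence of $N^{-1}\,\partial^2 A(\theta, X)/\partial\theta_i\partial\theta_j$ and
$N^{-1}\,\partial^3 A(\theta, X)/\partial\theta_i\partial\theta_j\partial\theta_h$ is
straightforward. Accordingly conditions C-1 and C-2 are satisfied.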


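To summarize the preceding argument, efficient estimates of $\alpha$,
$\sigma_\epsilon^2$ and $\sigma_\eta^2$ can be constructed as follows:

1)  Solve $\sum_{i=0}^{p} \alpha_i \left[ \sum_{t=p+\ell+1}^{N} X_{t-i} X_{t-p-\ell} / (N-p-\ell) \right] = 0$, $\ell = 1, 2, \ldots, p$,
for $\alpha$. Let the solution be $\tilde\alpha$.

2)  Calculate $\tilde\sigma_\epsilon^2$ and $\tilde\sigma_\eta^2$ by (12) and (13).

3)  Let $\theta_i^{(1)} = \tilde\alpha_i$, $i = 1, 2, \ldots, p$, $\theta_{p+1}^{(1)} = \tilde\sigma_\epsilon^2$ and $\theta_{p+2}^{(1)} = \tilde\sigma_\eta^2$, and
apply the following iteration formula:

$$\theta^{(n)} = \theta^{(n-1)} - \left[ \frac{\partial^2 A(\theta^{(n-1)}, X)}{\partial\theta\,\partial\theta'} \right]^{-1} \frac{\partial A(\theta^{(n-1)}, X)}{\partial\theta} , \qquad n = 2, 3, \ldots ,$$

where ($A(\theta, X)$ and $g(\omega|\alpha)$ below are abbreviated as $A$ and $g$)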


9A  9A 


"l>  <^g)g  'Jj  (-?♦<.)*  ’ 


9A  _ 9A_ 
a0p+l  ‘ 302 


v _J_.T  I(V«  . 

ij  a2  . c2g  ij  (a2  ♦ o^g)2  ’ 


9A  _ 9A 

90  .O 

p+2  9a 


M_=  v i + y ,.I(-J)g 

30?  J o2  + o!g  J (a2  + o2g)2 


n e1 


n e' 
$$\frac{\partial^2 A}{\partial\theta_\ell\,\partial\theta_m} = \frac{\partial^2 A}{\partial\alpha_\ell\,\partial\alpha_m} =$$


2Q2cos{(m-Jl)o)j}  Uo2(g2  + 2g2g) ( ^cosOc-A)^ ) ( E^cos (k-m)Mj ) 

(a2  + g2g)2  {(°*  + o2g)g}2 


r 2l(w1)g2cos(m-£)a)1  r l+l(a)1)g2(^kakcos(k-A)a)1)(^kakcos(k.-m)Wj) 

-E  - ^ J + y . ~ « -> 


j ^2  . 2 x2 

<®e  * °ne) 


l°e  * ®n«>3 


32A 


_ _ 32A 

30a36p+i  30 ^o2 


r ag^C^o^cosOt-JDajj)  ln(a)J)g2g(j;kakcos(k-«.)aJk) 
Ii  .9  9 .3  + M 


<®n  * ®e«>2 


32A  _ 32A 


3ek  s«4>2 


I 


_£_ 


* <e)2 


32A  _ 32A 


21(10,  )g3(g2  + g2g) 

y « — — 1 — 2 — 

l)  (a?  ♦ a2e)“ 


e tr 


*ep+2  3(an)2 


z 


2l(Uj7g 


Ve  ♦ a2g)2  J(g2  + a2g)3 


C 




r 2l(o)j)Xkakcos(k-A)u»j  4l(w1  )gp^(Xkakcx>s(k-A)wm) 

” I*  . O 0.0  . O 0.7 


<«£  * O^e)3 


32A 


36  ^30 

p+1  p+2 


32A 

3a2  9a2 
e n 


-X 


J(o2  ♦ o2g)2 


- I 


2l((*)j  )s 

H * &s 


32A 


32A 


3V9p+2  3a^3o^ 


2^008  (k-t)Mj  2on(^k\COB  ) 

J (o?  + a2g)2  J (o2  + a2g)2 


n e 


n e 


3.  Numerical Examples.

Here are given two examples of the method for simulated data
generated from the model

(14)  $$x_t = 0.6\,x_{t-1} + \eta_t , \qquad Y_t = x_t + \epsilon_t ,$$

where $\epsilon_t$ and $\eta_t$ are respectively generated by a normal random-number
generator with mean 0 and standard deviation 1. The numerical results
in Table 1 exhibit the estimates obtained from generated values of $Y_t$
(sample size = 200); the first column in the table shows starting values
(namely consistent estimates) of $\alpha$, $\sigma_\epsilon^2$ and $\sigma_\eta^2$, and the second column,
the third, and so forth, show each step of the iteration.
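Data of the kind used in these examples can be generated as in the following sketch. It is an assumption-laden illustration: the random-number generator, the seed and the burn-in length are arbitrary choices, and the function name is hypothetical, so the output will not reproduce the particular series behind the tables.

```python
import numpy as np

def simulate_model_14(n, rng, phi=0.6, burn_in=500):
    """Generate Y_t = x_t + eps_t with x_t = phi*x_{t-1} + eta_t,
    eps_t and eta_t independent standard normal."""
    eta = rng.standard_normal(n + burn_in)
    eps = rng.standard_normal(n)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + eta[t]
    # Discard the burn-in so that the retained signal is close to stationary
    return x[burn_in:] + eps

y200 = simulate_model_14(200, np.random.default_rng(1))
y300 = simulate_model_14(300, np.random.default_rng(2))
print(y200.var(), y300.var())
```

Under (14) the variance of $Y_t$ is $1/(1 - 0.6^2) + 1 = 2.5625$, which gives a quick sanity check on a simulated series of moderate length.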


TABLE 1 (*)

Step of iteration     1st     2nd     3rd     4th     5th     6th     7th

$\alpha$              0.705   0.703   0.701   0.700   0.699   0.697   0.696
$\sigma_\eta^2$       0.546   0.568   0.573   0.576   0.579   0.582   0.584
$\sigma_\epsilon^2$   1.286   1.253   1.250   1.247   1.244   1.242   1.243

The estimation from 300 values of $Y_t$ yielded this:

TABLE 2 (*)

Step of iteration     1st     2nd     3rd     4th     5th     6th     7th

$\alpha$              0.517   0.523   0.526   0.529   0.532   0.533   0.535
$\sigma_\eta^2$       1.249   1.177   1.165   1.155   1.147   1.140   1.135
$\sigma_\epsilon^2$   0.708   0.787   0.798   0.807   0.813   0.818   0.823

Obviously the case of sample size 300 gives better estimation than 200;
but in both cases it can be observed that the convergence is very slow.

(*) In Tables 1 and 2, the values in the 1st column denote consistent
estimates of $\alpha$, $\sigma_\epsilon^2$ and $\sigma_\eta^2$ which are starting values of iteration.


The following Table 3 displays the covariance matrices $\frac{4\pi}{N} W_0^{-1}$ of
the estimates of $\alpha$, $\sigma_\eta^2$ and $\sigma_\epsilon^2$ for sample sizes $N = 200$ and 300
evaluated by means of the asymptotic covariance matrix given in Theorem 1,
where the element corresponding to the column $\alpha$ and the row $\sigma_\eta^2$ denotes,
for example, the covariance of the estimates of $\alpha$ and $\sigma_\eta^2$.

TABLE 3

                              N = 200                        N = 300
                      $\alpha$  $\sigma_\eta^2$  $\sigma_\epsilon^2$      $\alpha$  $\sigma_\eta^2$  $\sigma_\epsilon^2$

$\alpha$               0.050   -0.325    0.433        0.033   -0.216    0.289
$\sigma_\eta^2$       -0.325    2.178   -2.858       -0.216    1.145   -1.905
$\sigma_\epsilon^2$    0.433   -2.858    3.811        0.289   -1.905    2.541


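APPENDIX 1

THE NEWTON-RAPHSON METHOD

Let $L_N(\theta)$ be a likelihood function of $\theta = \{\theta_1, \theta_2, \ldots, \theta_p\}$ given
observations $X_1, X_2, \ldots, X_N$ and let $N_\delta(\theta^0) = \{\theta : \|\theta - \theta^0\| < \delta\}$ be a
certain neighborhood of $\theta^0$, the true value of $\theta$. Now assume the
following.

B-1)  $\log L_N(\theta)$ is third-order differentiable with respect
to $\theta_i$, $i = 1, 2, \ldots, p$, for $\theta \in N_\delta(\theta^0)$;
$\ell_{ij} = \mathrm{plim}_{N\to\infty}\, \partial^2 \log L_N(\theta^0) / N\partial\theta_i\partial\theta_j$ exists and the matrix
$(\ell_{ij})$, $i, j = 1, 2, \ldots, p$, is nonsingular.

B-2)  $\partial^3 \log L_N(\theta) / N\partial\theta_i\partial\theta_j\partial\theta_k$ is bounded in probability uniformly in
$\theta \in N_\delta(\theta^0)$.

B-3)  There exists a consistent estimate $\theta^1$ [i.e., $\theta^1 \to \theta^0$ in
probability as $N \to \infty$] such that $\sqrt{N}(\theta^1 - \theta^0)$ has a limiting
distribution with a finite covariance matrix.

B-4)  Let $\hat\theta$ be a solution of the likelihood equations which is
consistent; then $\sqrt{N}(\hat\theta - \theta^0)$ is also assumed to have a finite
asymptotic covariance matrix.

Let $\Gamma_N(\theta)$ be a $p$ by $p$ matrix whose $(i,j)$ element is
$\partial^2 \log L_N(\theta)/\partial\theta_i\partial\theta_j$ and $\gamma_N(\theta)$ be a $p$-vector whose $i$-th element is
$\partial \log L_N(\theta)/\partial\theta_i$. Define

(15)  $$\theta^2 = \theta^1 - \Gamma_N(\theta^1)^{-1}\, \gamma_N(\theta^1) ,$$

(16)  $$\theta^3 = \theta^2 - \Gamma_N(\theta^2)^{-1}\, \gamma_N(\theta^2) .$$

Theorem 3:

If B-1) through B-4) hold, $\sqrt{N}(\theta^2 - \hat\theta)$ tends to 0 in probability;
in other words, $\sqrt{N}(\theta^2 - \theta^0)$ has the same limiting distribution as the
maximum-likelihood estimate. Furthermore, under the same conditions,
$N(\theta^3 - \hat\theta)$ tends to 0 in probability.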
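The statement of the theorem can be illustrated on a simple model that is not treated in the paper. The sketch below assumes a Cauchy location model, chosen only because its maximum-likelihood estimate has no closed form while the sample median supplies a $\sqrt{N}$-consistent starting value; the sample size, seed and true location are arbitrary. The maximum-likelihood estimate is approximated by continuing the iteration to convergence.

```python
import numpy as np

def score_and_hessian(theta, x):
    """First and second derivatives of the Cauchy location log-likelihood
    log L(theta) = -sum log(1 + (x - theta)^2) + const."""
    u = x - theta
    score = np.sum(2.0 * u / (1.0 + u**2))
    hess = np.sum(2.0 * (u**2 - 1.0) / (1.0 + u**2) ** 2)
    return score, hess

def newton_step(theta, x):
    score, hess = score_and_hessian(theta, x)
    return theta - score / hess

rng = np.random.default_rng(0)
x = 0.3 + rng.standard_cauchy(2000)

theta1 = np.median(x)             # consistent starting value
theta2 = newton_step(theta1, x)   # first-step estimate
theta3 = newton_step(theta2, x)   # second-step estimate

theta_hat = theta3                # iterate further to approximate the MLE
for _ in range(50):
    theta_hat = newton_step(theta_hat, x)

e1, e2, e3 = (abs(t - theta_hat) for t in (theta1, theta2, theta3))
print(e1, e2, e3)
```

The distances to the maximum-likelihood estimate shrink sharply from one step to the next, which is the finite-sample counterpart of the orders $1/\sqrt{N}$ and $1/N$ in the theorem.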


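Proof:

By the Taylor expansion of $\partial \log L_N(\hat\theta)/\partial\theta_i = 0$ around $\theta^1$,

(17)  $$\frac{\partial \log L_N(\theta^1)}{\partial\theta_i} + \sum_j (\hat\theta_j - \theta_j^1)\, \frac{\partial^2 \log L_N(\theta^1)}{\partial\theta_i\partial\theta_j}
+ \frac{1}{2}\sum_j \sum_k (\hat\theta_j - \theta_j^1)(\hat\theta_k - \theta_k^1)\, \frac{\partial^3 \log L_N(\theta^*)}{\partial\theta_i\partial\theta_j\partial\theta_k} = 0 , \qquad i = 1, 2, \ldots, p ,$$

where $\theta^*$ is such that $\theta_i^*$ lies between $\hat\theta_i$ and $\theta_i^1$ for $i = 1, 2, \ldots, p$.
On the other hand,

(18)  $$\sum_j (\theta_j^2 - \hat\theta_j)\, \frac{\partial^2 \log L_N(\theta^1)}{\partial\theta_i\partial\theta_j}
= \sum_j (\theta_j^2 - \theta_j^1)\, \frac{\partial^2 \log L_N(\theta^1)}{\partial\theta_i\partial\theta_j} + \sum_j (\theta_j^1 - \hat\theta_j)\, \frac{\partial^2 \log L_N(\theta^1)}{\partial\theta_i\partial\theta_j}$$

$$= \sum_j (\theta_j^1 - \hat\theta_j)\, \frac{\partial^2 \log L_N(\theta^1)}{\partial\theta_i\partial\theta_j} - \frac{\partial \log L_N(\theta^1)}{\partial\theta_i}$$

by (15). From (17) and (18),

(19)  $$\sum_j \sqrt{N}(\theta_j^2 - \hat\theta_j)\, \frac{\partial^2 \log L_N(\theta^1)}{N\partial\theta_i\partial\theta_j}
= \frac{1}{2}\sum_j \sum_k \sqrt{N}(\hat\theta_j - \theta_j^1)\, \frac{\partial^3 \log L_N(\theta^*)}{N\partial\theta_i\partial\theta_j\partial\theta_k}\, (\hat\theta_k - \theta_k^1) .$$

Writing each term on the right-hand side above as

$$\sqrt{N}(\hat\theta_j - \theta_j^1)\; \frac{\partial^3 \log L_N(\theta^*)}{N^{1+\epsilon}\partial\theta_i\partial\theta_j\partial\theta_k}\; N^{\epsilon}(\hat\theta_k - \theta_k^1) ,$$

we see that, for $0 < \epsilon < 1/2$, both $\partial^3 \log L_N(\theta^*) / N^{1+\epsilon}\partial\theta_i\partial\theta_j\partial\theta_k$
and $N^{\epsilon}(\hat\theta_k - \theta_k^1)$ converge to 0 in probability
and $\sqrt{N}(\hat\theta_j - \theta_j^1)$ is asymptotically of finite variance by assumption.
Thus the whole quantity on the right of (19) converges to 0 in probability.
It is easy to see that $\partial^2 \log L_N(\theta^1)/N\partial\theta_i\partial\theta_j$ converges to $\ell_{ij}$.
By assumption, the matrix $(\ell_{ij})$ is nonsingular so that the $\sqrt{N}(\theta_j^2 - \hat\theta_j)$
tends to 0 in probability. In order to prove the second assertion of
the theorem, note the following equation:

(20)  $$N(\theta^3 - \hat\theta) = \{ I - \Gamma_N(\theta^2)^{-1}\Gamma_N(\hat\theta) \}\, N(\theta^2 - \hat\theta)
- \frac{1}{2}\,\Gamma_N(\theta^2)^{-1} \left\{ \sum_k \frac{\partial\Gamma_N(\theta^{**})}{\partial\theta_k}\, (\theta_k^2 - \hat\theta_k) \right\} N(\theta^2 - \hat\theta) ,$$

where $\theta^{**}$ is a vector such that $\theta_j^{**}$ lies between $\hat\theta_j$ and $\theta_j^2$, $j = 1, 2, \ldots, p$,
$\partial\Gamma_N(\theta)/\partial\theta_k$ is a $p$ by $p$ matrix with $\partial^3 \log L_N(\theta)/\partial\theta_i\partial\theta_j\partial\theta_k$ as its
$(i,j)$ element and $I$ is the $p$ by $p$ identity matrix. Then if $N(\theta^2 - \hat\theta)$
is bounded in probability, the first term on the right-hand side of (20)
converges to 0 in probability since $\Gamma_N(\theta^2)^{-1}\Gamma_N(\hat\theta)$ converges to the
identity matrix, and the second term tends to 0 since
$\Gamma_N(\theta^2)^{-1}\,\partial\Gamma_N(\theta^{**})/\partial\theta_k$ is asymptotically bounded. The fact that $N(\theta^2 - \hat\theta)$
is bounded in probability is evident in view of the following equation:

(21)  $$N(\theta^2 - \hat\theta) = \sqrt{N}\{ I - \Gamma_N(\theta^1)^{-1}\Gamma_N(\hat\theta) \}\, \sqrt{N}(\theta^1 - \hat\theta)
- \frac{1}{2}\,\Gamma_N(\theta^1)^{-1} \left\{ \sum_k \frac{\partial\Gamma_N(\theta^{***})}{\partial\theta_k}\, \sqrt{N}(\theta_k^1 - \hat\theta_k) \right\} \sqrt{N}(\theta^1 - \hat\theta) ,$$

where $\theta_j^{***}$ lies between $\hat\theta_j$ and $\theta_j^1$, $j = 1, 2, \ldots, p$, since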
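APPENDIX 2

THE ASYMPTOTIC PROPERTIES OF THE LEAST-SQUARES ESTIMATE
OF A LINEAR PROCESS PLUS WHITE NOISE

Let $\{X_t : t = \ldots, -1, 0, 1, \ldots\}$ be a stationary process representable
as a linear process $X_t = \sum_{i=0}^{\infty} \gamma_i(\theta)\, e_{t-i}$, where the $e_t$ are independent
random variables such that $E(e_t) = 0$, $E(e_t^2) = \sigma_e^2(\theta)$ and $E(e_t^4) = C < \infty$.
The $\gamma_i$ and $\sigma_e^2$ are functions solely of $\theta = (\theta_1, \theta_2, \ldots, \theta_q)$.
Suppose that the process $\{X_t\}$ has a spectral density $f(\omega|\theta)$ with respect
to the Lebesgue measure. Then define a process $\{Y_t\}$ by $Y_t = X_t + \eta_t$,
where $\{X_t\}$ is defined as above, the $\eta_t$ are i.i.d. random variables
with mean 0, variance $\mathrm{Var}(\eta_t) = \sigma_\eta^2(\theta)$ and $E(\eta_t^4) < \infty$, and $\{e_t\}$
is independent of $\{\eta_t\}$. Set $g(\omega|\theta) = f(\omega|\theta) + \frac{1}{2\pi}\sigma_\eta^2(\theta)$. Now assume
the following:

C-1)  $\theta^0$, the true value of $\theta$, is in $\Theta$, a compact subset of $R^q$.

C-2)  $g(\omega|\theta^1)$ cannot be equal to $g(\omega|\theta^2)$ a.e., for $\theta^1 \neq \theta^2$.

C-3)  $h(\omega|\theta) = 1/g(\omega|\theta)$, and $h^{(i)}(\omega|\theta) = \partial h(\omega|\theta)/\partial\theta_i$, $i = 1, 2, \ldots, q$,
are continuous in $(\omega, \theta)$ for $|\omega| \le \pi$, $\theta \in \Theta$,
and $W_0$, the $q$ by $q$ matrix with the representative term

$$\int_{-\pi}^{\pi} \frac{h^{(i)}(\omega|\theta^0)}{h(\omega|\theta^0)}\, \frac{h^{(j)}(\omega|\theta^0)}{h(\omega|\theta^0)}\, d\omega ,$$

is nonsingular.

C-4)  $h^{(i,j)}(\omega|\theta) = \partial^2 h/\partial\theta_i\partial\theta_j$ and $h^{(i,j,k)}(\omega|\theta) = \partial^3 h/\partial\theta_i\partial\theta_j\partial\theta_k$
exist and are continuous in $(\omega, \theta)$ for $|\omega| \le \pi$, and
$\theta \in N_\delta(\theta^0)$, a neighborhood of $\theta^0$; namely $N_\delta(\theta^0) = \{\theta : \|\theta - \theta^0\| < \delta\}$.

C-5)  $\sum_{i=0}^{\infty} i\,|\gamma_i(\theta^0)| < \infty$.

Set

$$U_N(\theta) = \frac{1}{(2\pi)^2}\int_{-\pi}^{\pi} \left|\sum_{t=1}^{N} Y_t e^{i\omega t}\right|^2 \Big/ g(\omega|\theta)\; d\omega$$

and define

$$S_N(\theta) = -\frac{N}{4\pi}\int_{-\pi}^{\pi} \log g(\omega|\theta)\,d\omega - \frac{1}{2}\, U_N(\theta) .$$

Let $\hat\theta$ be a value of $\theta$ which maximizes $S_N(\theta)$.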


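Theorem 5:

Assume the conditions C-1 through C-5. Then $\hat\theta$, the approximate maximum-likelihood
estimate of $\theta$, is consistent, and $\sqrt{N}(\hat\theta - \theta^0)$ has the limiting
distribution $N(0,\; 4\pi W_0^{-1} + \kappa_4(e)\, W_0^{-1} U_0 W_0^{-1} + \kappa_4(\eta)\, W_0^{-1} V_0 W_0^{-1})$, where $W_0$, $U_0$,
$V_0$ are $q$ by $q$ matrices having $i,j$-th element

$$\int_{-\pi}^{\pi} \frac{h^{(i)}(\omega|\theta^0)}{h(\omega|\theta^0)}\, \frac{h^{(j)}(\omega|\theta^0)}{h(\omega|\theta^0)}\, d\omega ,$$

$$\int_{-\pi}^{\pi} h^{(i)}(\omega|\theta^0)\, f(\omega|\theta^0)\,d\omega \int_{-\pi}^{\pi} h^{(j)}(\omega|\theta^0)\, f(\omega|\theta^0)\,d\omega ,$$

$$\frac{\sigma_\eta^2(\theta^0)}{2\pi}\int_{-\pi}^{\pi} h^{(i)}(\omega|\theta^0)\,d\omega \cdot \frac{\sigma_\eta^2(\theta^0)}{2\pi}\int_{-\pi}^{\pi} h^{(j)}(\omega|\theta^0)\,d\omega ,$$

respectively. If the $e_t$ and $\eta_t$ are Gaussian, the asymptotic distribution
is $N(0,\; 4\pi W_0^{-1})$.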
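In the Gaussian case the limiting covariance $4\pi W_0^{-1}$ can be evaluated by numerical integration. The following sketch specializes to an AR(1) signal plus white noise, which is an assumption made here for illustration; the function names are hypothetical, the parameter order is $(\alpha, \sigma_\epsilon^2, \sigma_\eta^2)$ as in (10), and the sign convention is that of (2). Since $h = 1/f$, the ratios $h^{(i)}/h$ equal $-\partial \log f/\partial\theta_i$, so only derivatives of $\log f$ are needed.

```python
import numpy as np

def w0_ar1_plus_noise(alpha, var_eps, var_eta, m=4096):
    """W_0 for f = (var_eta/g + var_eps)/(2*pi), g = |1 + alpha e^{iw}|^2,
    by the midpoint rule on (-pi, pi)."""
    omega = -np.pi + 2.0 * np.pi * (np.arange(m) + 0.5) / m
    g = 1.0 + 2.0 * alpha * np.cos(omega) + alpha**2
    D = var_eta + var_eps * g
    # Derivatives of log f with respect to alpha, var_eps, var_eta
    dlogf = np.vstack([
        -2.0 * (np.cos(omega) + alpha) * var_eta / (D * g),
        g / D,
        1.0 / D,
    ])
    return (2.0 * np.pi / m) * dlogf @ dlogf.T

def gaussian_asymptotic_cov(alpha, var_eps, var_eta, n):
    """Approximate covariance (4*pi/n) * W_0^{-1} of the estimates for sample size n."""
    return 4.0 * np.pi * np.linalg.inv(w0_ar1_plus_noise(alpha, var_eps, var_eta)) / n

cov_200 = gaussian_asymptotic_cov(-0.6, 1.0, 1.0, 200)
print(np.round(cov_200, 3))
```

Because $W_0$ is a Gram matrix of three linearly independent functions of $\omega$, it is positive definite whenever $\sigma_\eta^2 > 0$, which is what makes the inversion legitimate.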

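This theorem is derived by applying several modifications to Walker's
results [1964]. For the arguments, the next lemma is important. The
result is a straightforward extension of the Grenander-Rosenblatt theorem
[1957, p. 137] and the proof is omitted.

Lemma 1:

Let $W_j(\omega)$, $j = 1, 2, \ldots, p$, be any bounded even functions of $\omega$
with at most a finite number of discontinuities; let $\kappa_4(e)$ and $\kappa_4(\eta)$
be fourth cumulants of $e_t$ and $\eta_t$ respectively. Then,

$$\lim_{N\to\infty} N\, \mathrm{cov}\left\{ \int_{-\pi}^{\pi} I_N(\omega) W_j(\omega)\,d\omega ,\; \int_{-\pi}^{\pi} I_N(\omega) W_k(\omega)\,d\omega \right\}$$

$$= 16\pi^2 \left[ 4\pi \int_{-\pi}^{\pi} g(\omega)^2 W_j(\omega) W_k(\omega)\,d\omega
+ \kappa_4(e) \left\{ \int_{-\pi}^{\pi} f(\omega) W_j(\omega)\,d\omega \right\} \left\{ \int_{-\pi}^{\pi} f(\omega) W_k(\omega)\,d\omega \right\} \right.$$

$$\left. +\; \kappa_4(\eta) \left\{ \frac{\sigma_\eta^2}{2\pi}\int_{-\pi}^{\pi} W_j(\omega)\,d\omega \right\} \left\{ \frac{\sigma_\eta^2}{2\pi}\int_{-\pi}^{\pi} W_k(\omega)\,d\omega \right\} \right] ,$$

where $I_N(\omega)$ is the periodogram of the $Y_t$, namely $I_N(\omega) = \frac{2}{N}\left|\sum_{t=1}^{N} Y_t e^{i\omega t}\right|^2$.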
A.  The Consistency of $\hat\theta$.

Lemma 2 [Walker (1964) p. 368]:

There exists a function $H_{\delta,N}(\theta^1)$ of $\theta^1$ and the observations such that

$|N^{-1}[U_N(\theta^2) - U_N(\theta^1)]| < H_{\delta,N}(\theta^1)$ for all $\theta^1, \theta^2 \in \Theta$ $(\|\theta^2 - \theta^1\| < \delta)$,

$\lim_{\delta\to 0} E(H_{\delta,N}) = 0$ uniformly in $N$,

$\lim_{N\to\infty} \mathrm{Var}(H_{\delta,N}) = 0$ for each $\delta$.

Lemma 3:

Let $\theta^0$ be the true value of $\theta$ and $\theta^*$ be any other point in $\Theta$;
then

$$\lim_{N\to\infty} P\left\{ \frac{1}{N}\left[ S_N(\theta^0) - S_N(\theta^*) \right] > K(\theta^0, \theta^*) \right\} = 1$$

for some positive $K(\theta^0, \theta^*)$.


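Proof:

$$\lim_{N\to\infty} E\left\{ \frac{1}{N}\left[ S_N(\theta^0) - S_N(\theta^*) \right] \right\}
= -\frac{1}{4\pi}\int_{-\pi}^{\pi} \log \frac{g(\omega|\theta^0)}{g(\omega|\theta^*)}\,d\omega
- \frac{1}{2}\lim_{N\to\infty} E\left\{ \frac{1}{N}\left[ U_N(\theta^0) - U_N(\theta^*) \right] \right\} ,$$

where the second term is equal to

$$\frac{1}{4\pi}\int_{-\pi}^{\pi} \left\{ \frac{g(\omega|\theta^0)}{g(\omega|\theta^*)} - 1 \right\} d\omega .$$

Thus

$$\lim_{N\to\infty} E\left\{ \frac{1}{N}\left[ S_N(\theta^0) - S_N(\theta^*) \right] \right\}
= -\frac{1}{4\pi}\left[ \int_{-\pi}^{\pi} \log \frac{g(\omega|\theta^0)}{g(\omega|\theta^*)}\,d\omega
- \int_{-\pi}^{\pi} \log \exp\left\{ \frac{g(\omega|\theta^0)}{g(\omega|\theta^*)} - 1 \right\} d\omega \right] .$$

But note that, for any $x \in R$, $x \le e^{x-1}$ with equality at $x = 1$. Hence

$$\frac{g(\omega|\theta^0)}{g(\omega|\theta^*)} \le \exp\left\{ \frac{g(\omega|\theta^0)}{g(\omega|\theta^*)} - 1 \right\} ,$$

where, by condition 2, the equality does not hold for almost all $\omega$. Thus

(22)  $$\int_{-\pi}^{\pi} \log \frac{g(\omega|\theta^0)}{g(\omega|\theta^*)}\,d\omega - \int_{-\pi}^{\pi} \log \exp\left\{ \frac{g(\omega|\theta^0)}{g(\omega|\theta^*)} - 1 \right\} d\omega < 0 .$$

On the other hand,

$$\mathrm{Var}\left\{ \frac{1}{N}\left[ S_N(\theta^0) - S_N(\theta^*) \right] \right\} = \mathrm{Var}\left\{ \frac{1}{2N}\left[ U_N(\theta^0) - U_N(\theta^*) \right] \right\} ,$$

where the right-side term converges to 0, as $N \to \infty$, in view of Lemma 2. []

Define $H_{\delta,N}(\theta^1)$ as in Lemma 2 and let

$$J_\delta(\theta^1) = \max_{\{\theta :\, \|\theta - \theta^1\| < \delta\}} \frac{1}{4\pi}\int_{-\pi}^{\pi} \left| \log g(\omega|\theta^1) - \log g(\omega|\theta) \right| d\omega .$$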
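Now put $H^*_{\delta,N}(\theta^1) = \frac{1}{2} H_{\delta,N}(\theta^1) + J_\delta(\theta^1)$.

Lemma 4:

Let $\|\theta^2 - \theta^1\| < \delta$, then

$$\left| \frac{1}{N}\left[ S_N(\theta^2) - S_N(\theta^1) \right] \right| < H^*_{\delta,N}(\theta^1) , \qquad \lim_{\delta\to 0} E(H^*_{\delta,N}) = 0$$

uniformly in $N$, and $\lim_{N\to\infty} \mathrm{Var}(H^*_{\delta,N}) = 0$ for any $\delta$.

Proof:

$$\left| \frac{1}{N}\left[ S_N(\theta^2) - S_N(\theta^1) \right] \right| \le \frac{1}{2}\left| \frac{1}{N}\left[ U_N(\theta^2) - U_N(\theta^1) \right] \right|
+ \frac{1}{4\pi}\int_{-\pi}^{\pi} \left| \log g(\omega|\theta^2) - \log g(\omega|\theta^1) \right| d\omega
\le \frac{1}{2} H_{\delta,N}(\theta^1) + J_\delta(\theta^1) .$$

In view of Lemma 2, it suffices to prove that $\lim_{\delta\to 0} J_\delta(\theta^1) = 0$.
By the mean-value theorem,

$$\int_{-\pi}^{\pi} \left| \log g(\omega|\theta^2) - \log g(\omega|\theta^1) \right| d\omega
\le \int_{-\pi}^{\pi} \sum_{k=1}^{q} |\theta_k^2 - \theta_k^1| \left| \frac{\partial g(\omega|\lambda\theta^1 + (1-\lambda)\theta^2)}{\partial\theta_k}\; h(\omega|\lambda\theta^1 + (1-\lambda)\theta^2) \right| d\omega ,$$

where $\partial g(\omega|\lambda\theta^1 + (1-\lambda)\theta^2)/\partial\theta_k$ and $h(\omega|\lambda\theta^1 + (1-\lambda)\theta^2)$ are bounded functions on
$\{\theta^2 : \|\theta^2 - \theta^1\| < \delta\}$, by conditions 3) and 4). Thus as $\delta \to 0$,

$$\max_{\{\theta :\, \|\theta - \theta^1\| < \delta\}} \int_{-\pi}^{\pi} \left| \log g(\omega|\theta^1) - \log g(\omega|\theta) \right| d\omega \to 0 . \quad []$$

Now the consistency of $\hat\theta$ follows from Lemma 3 and Lemma 4, by almost
the same steps given by Walker (1964).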

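B.  Asymptotic Distribution of $\hat\theta$.

It holds that

(23)  $$\sqrt{N}(\hat\theta - \theta^0) = -\left[ \frac{\partial^2 S_N(\theta^*)}{N\partial\theta\,\partial\theta'} \right]^{-1} \frac{\partial S_N(\theta^0)}{\sqrt{N}\,\partial\theta} ,$$

where $\theta^* = \lambda\hat\theta + (1-\lambda)\theta^0$ for some $\lambda$, $0 < \lambda < 1$, $\partial^2 S_N(\cdot)/N\partial\theta\partial\theta'$ denotes
the $q$ by $q$ matrix with elements $\partial^2 S_N(\cdot)/N\partial\theta_i\partial\theta_j$, $i, j = 1, 2, \ldots, q$;
$\partial S_N(\cdot)/\sqrt{N}\partial\theta$ is the $q$-vector with elements $\partial S_N(\cdot)/\sqrt{N}\partial\theta_j$, $j = 1, 2, \ldots, q$
(differentiation with respect to $\theta$ below is defined in the same way). By condition 4),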


32Sn(0*) 


N3030 


> i r 


log  h(u)|0  )dto 
Therefore the following result holds.

Lemma 5:

The only problem which remains is to find the asymptotic distribution
of $\frac{1}{\sqrt{N}}\, \frac{\partial S_N(\theta^0)}{\partial\theta}$.

Lemma 6:

$$\lim_{N\to\infty} E\left[ \frac{1}{\sqrt{N}}\, \frac{\partial S_N(\theta^0)}{\partial\theta_i} \right] = 0 , \qquad i = 1, 2, \ldots, q .$$

Proof:

The asymptotic normality follows from an argument similar to Walker
(1964) p. 375. The asymptotic covariance is evaluated by setting
$W_j(\omega) = h^{(j)}(\omega)$ in Lemma 1. []

Now  asymptotically,  it  holds  that 
But in view of Lemma 7, the last term above is distributed as
$N\left(0,\; \frac{1}{4\pi} W_0 + \frac{1}{4(2\pi)^2}\left[ \kappa_4(e)\, U_0 + \kappa_4(\eta)\, V_0 \right]\right)$. Accordingly in view of (23)
and Lemma 5, $\sqrt{N}(\hat\theta - \theta^0)$ is asymptotically distributed as
$N(0,\; 4\pi W_0^{-1} + \kappa_4(e)\, W_0^{-1} U_0 W_0^{-1} + \kappa_4(\eta)\, W_0^{-1} V_0 W_0^{-1})$.